In this paper we show that a qudit scheme for creating a controlled-Toffoli gate, proposed by Ralph et al. [Phys. Rev. A 75, 011213], can be adapted to qubits. While the directly adapted scheme requires more gates than standard schemes for creating large controlled gates, we show that with simple adaptations it is directly equivalent to the standard scheme in the literature. This scheme is the most gate-efficient way of creating large controlled unitaries currently known; however, it is expensive in terms of the number of ancilla qubits used. We go on to show that, using a combination of the standard techniques presented by Barenco et al. [Phys. Rev. A 52, 3457 (1995)], we can create an n-qubit version of the Toffoli gate using fewer gates and the same number of ancilla qubits as recent work using computer optimization. This would be useful in any quantum computing architecture where gates are cheap but qubit initialization is expensive.

Making a unitary controlled on other qubits is an essential task for many algorithms in quantum computing [1-3]. In this paper we focus on a particular problem: taking a unitary which is already controlled on one qubit and making it controlled on $n - 1$ further qubits. These highly controlled unitaries (i.e. unitaries controlled on more than one other qubit) are useful in numerous quantum algorithms, including the oracle in the binary welded tree algorithm [4] and quantum simulation [2], [3]. Barenco et al. [5] outlined several techniques for making controlled unitaries in 1995. This work was expanded on in Nielsen and Chuang [6] to provide a technique for making an n-qubit version of the CNOT gate using $14n - 13$ operations and $n - 2$ ancilla. Other work has explored this problem using computational algorithms to optimize circuit layout, based on the decomposition procedures proposed by Barenco et al. [5] and known commutation relations [7-12]. However, while these techniques are useful if we have native controlled-square-root-of-NOT gates, in the case where the only native two-qubit operation is a CNOT or C-Phase they perform worse than the technique in Nielsen and Chuang [6] for the same number of ancilla.

An interesting alternative technique was proposed by Ralph et al. [13] and implemented experimentally by Lanyon et al. [14], who used additional levels in one of the subsystems of the controlled gate to reduce the overall number of operations required. In this work we show that, when converted to qubits, this technique is directly equivalent to the one from Nielsen and Chuang [6], using the same number of operations and requiring the same number of ancilla. We go on to show that by combining the techniques from Nielsen and Chuang [6] with other decomposition procedures from Barenco et al. [5] it is possible to reduce the number of ancilla qubits required from $n - 2$ to $2\sqrt{n-1}$ at the expense of doubling the number of operations. Our new technique requires fewer operations for the same number of ancilla qubits than existing decomposition schemes, if we assume both schemes have the ability to perform a controlled square root of NOT gate [7-12].

For the rest of this work we will represent a CNOT gate controlled on $n$ qubits as $C^nX$, and a general local unitary controlled on $n$ qubits as $C^nU$. Given the ability to perform general local unitaries and a CNOT gate, we can make a Toffoli gate using nine local unitaries and six CNOT gates [15]. However, if we are prepared to accept an approximate Toffoli gate, we can use the Margolus gate, which is equivalent to a Toffoli gate and a controlled-controlled phase. This procedure requires three CNOT gates and eight local unitaries [16], [17]. An alternative set of decomposition procedures is used if we assume the ability to perform the CV gate (a controlled square root of NOT) in a single operation. Here a standard Toffoli uses two CNOT gates and three CV gates [5], [12]. A more efficient decomposition is the Peres gate [11], [18], which is the equivalent of a Toffoli gate with an additional CNOT. This decomposition uses only one CNOT gate and three CV gates. In this work we can use the more efficient decomposition procedures because our Toffoli gates are arranged symmetrically with no other operations in between them.
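The two CV-based gate counts quoted above can be checked numerically. The following is a minimal sketch, not taken from the cited works: the gate orderings are the textbook ones, the inverse gate CV† is counted as a CV-type gate (an assumption made here), and all helper names are illustrative.

```python
# Pure-Python check (no external libraries) of two CV-based decompositions.
# Qubit 0 is the most significant bit of the basis-state index.

V = [[0.5 + 0.5j, 0.5 - 0.5j], [0.5 - 0.5j, 0.5 + 0.5j]]      # square root of X
V_DAG = [[0.5 - 0.5j, 0.5 + 0.5j], [0.5 + 0.5j, 0.5 - 0.5j]]  # inverse of V
X = [[0, 1], [1, 0]]
N_QUBITS = 3
DIM = 2 ** N_QUBITS


def bits_of(index):
    """Basis-state index -> list of bits, qubit 0 first."""
    return [(index >> (N_QUBITS - 1 - q)) & 1 for q in range(N_QUBITS)]


def index_of(bits):
    """List of bits, qubit 0 first -> basis-state index."""
    return sum(b << (N_QUBITS - 1 - q) for q, b in enumerate(bits))


def controlled(ctrl, tgt, u):
    """8x8 matrix applying the 2x2 gate u to qubit tgt when qubit ctrl is 1."""
    m = [[0j] * DIM for _ in range(DIM)]
    for col in range(DIM):
        bits = bits_of(col)
        if bits[ctrl] == 0:
            m[col][col] = 1 + 0j
            continue
        for out in (0, 1):
            new_bits = list(bits)
            new_bits[tgt] = out
            m[index_of(new_bits)][col] += u[out][bits[tgt]]
    return m


def classical(f):
    """8x8 permutation matrix of a reversible map f acting on a bit list."""
    m = [[0j] * DIM for _ in range(DIM)]
    for col in range(DIM):
        m[index_of(f(bits_of(col)))][col] = 1 + 0j
    return m


def compose(*gates):
    """Overall matrix when the gates are applied in the order given."""
    total = [[1 + 0j if r == c else 0j for c in range(DIM)] for r in range(DIM)]
    for g in gates:
        total = [[sum(g[r][k] * total[k][c] for k in range(DIM))
                  for c in range(DIM)] for r in range(DIM)]
    return total


def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 8x8 matrices."""
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(DIM) for c in range(DIM))


# Toffoli with controls on qubits 0, 1 and target 2: two CNOTs, three CV-type gates.
toffoli = compose(controlled(1, 2, V), controlled(0, 1, X),
                  controlled(1, 2, V_DAG), controlled(0, 1, X),
                  controlled(0, 2, V))
# Peres gate (Toffoli plus a CNOT from qubit 0 to 1): one CNOT, three CV-type gates.
peres = compose(controlled(0, 2, V), controlled(1, 2, V),
                controlled(0, 1, X), controlled(1, 2, V_DAG))

toffoli_ref = classical(lambda b: [b[0], b[1], b[2] ^ (b[0] & b[1])])
peres_ref = classical(lambda b: [b[0], b[0] ^ b[1], b[2] ^ (b[0] & b[1])])
```

Comparing `toffoli` with `toffoli_ref` and `peres` with `peres_ref` through `close` confirms that both sequences reproduce the intended gates, with five and four two-qubit operations respectively.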



Ralph et al. [13] and Lanyon et al. [14] demonstrate that by using a qutrit they can generate a Toffoli gate using only three CNOT gates, two standard Pauli X operations, and two implementations of a three-level version of the Pauli X operation. One requirement of Lanyon et al. is that the CNOT gates act trivially on the $|2\rangle$ level of the qutrit. Replacing the qutrit with two qubits would require each CNOT gate to be replaced with a Toffoli gate, as shown in Fig. 1. A direct qubit version of the Lanyon Toffoli gate is therefore impractical, since we would need to consume three Toffoli gates to build a single Toffoli gate.

FIG. 1: A qubit equivalent of the Lanyon Toffoli. A single Toffoli gate is created, but this requires the use of three Toffoli gates. All lines represent qubits.

FIG. 2: A circuit for performing a $C^3X$ gate (a) using three qubits and a ququart, where the ququart system is labelled q, and (b) using five qubits. In (a), $X_a$ switches $|0\rangle$ and $|2\rangle$, $X_b$ switches $|1\rangle$ and $|3\rangle$, and the CNOT gates act trivially on levels $|2\rangle$ and $|3\rangle$ of the ququart.



When we consider replacing the qudit with $d = 4$ (a ququart), as needed by Lanyon et al. [14] to generate a $C^3U$ gate, we use five Toffoli gates to generate a controlled Toffoli gate. In Fig. 2 we show the original circuit proposed by Lanyon et al. and our qubit adaptation. Breaking down our Toffoli gates into CNOT gates and general local unitaries means we require 53 operations. The standard decomposition by Barenco et al. [5] requires 44 operations. Therefore further simplification is needed.

As our scheme is a direct conversion from a ququart scheme, we have generated a gate sequence where all our Toffoli gates must act on both qubits which were formerly part of the ququart. The gates circled in Fig. 3(a) leave the ancilla qubit in $|1\rangle$ only if qubits three and four are initially in $|1\rangle$, and leave qubit four in $|1\rangle$ only if qubit two and qubit four are in the state $|1\rangle$. The first operation is easy to create using a single Toffoli gate, but the second operation is non-unitary and so cannot be created easily without the addition of an ancilla. However, the second operation also has a level of redundancy and can be replaced by a Toffoli gate which flips the target qubit only if the ancilla qubit and qubit three are in $|1\rangle$. The result is the circuit shown in Fig. 3(b), which is directly equivalent to previous results in Nielsen and Chuang [6]. In Fig. 3(c) we make a small adaptation, adding an additional Toffoli gate, CNOT gate, and ancilla qubit.

FIG. 3: We can simplify the circuit in (a) to the one illustrated in (b). In (c) additional Toffoli gates are used to make a general $C^3U$. All the lines represent qubits.
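Because Toffoli gates act as classical reversible gates on computational basis states, the ancilla-ladder construction of Nielsen and Chuang can be checked by brute force on classical bits. The sketch below is illustrative only: the wire ordering and the function names are choices made here, not taken from the figures.

```python
from itertools import product


def ladder_gates(n):
    """Toffoli list (control, control, target) realising C^nX on a ladder.

    Wires: controls 0..n-1, ancilla n..2n-3 (all starting in 0), target 2n-2.
    """
    if n < 2:
        raise ValueError("need at least two controls")
    target = 2 * n - 2
    if n == 2:
        return [(0, 1, target)]
    compute = [(0, 1, n)]                       # first ancilla = c0 AND c1
    for i in range(1, n - 2):
        compute.append((i + 1, n + i - 1, n + i))  # fold in the next control
    middle = (n - 1, 2 * n - 3, target)         # last control AND last ancilla
    return compute + [middle] + compute[::-1]   # uncompute in reverse order


def apply(gates, bits):
    """Apply Toffoli gates to a mutable list of classical bits."""
    for a, b, t in gates:
        bits[t] ^= bits[a] & bits[b]
    return bits


def verify_ladder(n):
    """Check every basis input: target flips iff all controls are 1, ancilla reset."""
    gates = ladder_gates(n)
    for controls in product((0, 1), repeat=n):
        for t in (0, 1):
            bits = list(controls) + [0] * (n - 2) + [t]
            apply(gates, bits)
            expected = t ^ int(all(controls))
            if bits[:n] != list(controls) or any(bits[n:-1]) or bits[-1] != expected:
                return False
    return True
```

For example, `len(ladder_gates(4))` is 5, and `verify_ladder(4)` confirms that the two ancilla are returned to 0 for every input.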



From simple counting arguments we find that the total number of Toffoli gates required to implement this sequence for a gate of the form $C^nX$ is $2n - 3$. When we limit our gate set to CNOT and local operations we need $14n - 13$ operations; when we have the ability to perform the CV gate, we need only $8n - 11$ operations to generate our $C^nX$ gate, compared to the $12n - 22$ required by Miller et al. [11], [12]. We can therefore clearly see that a simplification of the Lanyon scheme [14] is equivalent to the scheme in Nielsen and Chuang [6], and that this scheme is more efficient than other optimizations for creating large controlled gates.
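The CV-based count can be reproduced with a few lines of arithmetic. This sketch assumes, as one reading of the counting argument, that every Toffoli gate except the central one appears in a compute/uncompute pair and costs four operations (Peres form), while the central gate costs five. The figure for Miller et al. is taken as $12n - 22$ as quoted in the text, and the function names are illustrative.

```python
def toffoli_count(n):
    """Toffoli gates in the ladder construction of C^nX."""
    return 2 * n - 3


def ops_with_cv(n):
    """Operation count when CV is native: paired Toffolis cost 4, the central one 5."""
    paired = toffoli_count(n) - 1
    return 4 * paired + 5


def ops_miller(n):
    """Operation count quoted in the text for Miller et al. (assumed 12n - 22)."""
    return 12 * n - 22
```

Under these assumptions `ops_with_cv(n)` simplifies to $8n - 11$, which is below `ops_miller(n)` for every $n \geq 3$.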

However, the scheme provided in Nielsen and Chuang [6] requires a fixed number of ancilla, while more recent research [11], [12] provides techniques for implementing large controlled gates using any number of ancilla qubits. We therefore want to look at how to minimize the number of ancilla qubits required to generate a large controlled unitary using initialized ancilla. To do this we use the identity in Barenco et al. [5] which combines two copies of $C^lX$ and two copies of $C^mX$ to create $C^{l+m}X$ using only a single ancilla qubit. In Fig. 4 we show how we can use this identity to reduce the number of ancilla qubits used to generate a large controlled unitary. Generating one multiple-qubit controlled gate can be considered one cycle. Our procedure consists of $2c - 1$ cycles, where the first $c$ cycles are used to flip our target only if all the control qubits are in $|1\rangle$ and the other $c - 1$ cycles are used to return all the ancilla qubits to $|0\rangle$.

FIG. 4: Creating a large Toffoli gate from several smaller Toffoli gates (control qubits $b_1, \ldots, b_{11}$, with $c = 3$). This uses two ancilla qubits which start in the state $|0\rangle$ and makes a Toffoli gate which is controlled on 11 qubits. Additional ancilla qubits will be required to create the shown gates.
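The cycle structure described above can be sketched at the level of whole multi-controlled gates, again on classical bits. This is an illustration under stated assumptions rather than the exact circuit of Fig. 4: the controls are split into $c$ nearly equal groups (a choice made here), each cycle is treated as a single multi-controlled X, and the process ancilla needed inside each cycle are not modelled.

```python
from itertools import product


def split_controls(n, c):
    """Partition control wires 0..n-1 into c nearly equal groups (illustrative choice)."""
    if not 1 <= c <= n:
        raise ValueError("need 1 <= c <= n")
    base, extra = divmod(n, c)
    groups, start = [], 0
    for k in range(c):
        size = base + (1 if k < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def cycle_gates(n, c):
    """Multi-controlled X gates as (control_wires, target_wire), 2c - 1 in total.

    Wires: controls 0..n-1, cycle ancilla n..n+c-2 (starting in 0), target n+c-1.
    """
    groups = split_controls(n, c)
    cycle_anc = list(range(n, n + c - 1))
    target = n + c - 1
    compute = [(groups[k], cycle_anc[k]) for k in range(c - 1)]
    final = (groups[c - 1] + cycle_anc, target)   # last group plus all cycle ancilla
    return compute + [final] + compute[::-1]      # c cycles to flip, c - 1 to reset


def run(gates, bits):
    """Apply multi-controlled X gates to a mutable list of classical bits."""
    for controls, tgt in gates:
        if all(bits[w] for w in controls):
            bits[tgt] ^= 1
    return bits


def verify_cycles(n, c):
    """Check that the target flips iff all n controls are 1 and the ancilla reset."""
    gates = cycle_gates(n, c)
    for controls in product((0, 1), repeat=n):
        for t in (0, 1):
            bits = list(controls) + [0] * (c - 1) + [t]
            run(gates, bits)
            expected = t ^ int(all(controls))
            if bits[:n] != list(controls) or any(bits[n:-1]) or bits[-1] != expected:
                return False
    return True
```

With `n = 11` and `c = 3` this gives five cycles and two cycle ancilla, matching the counts stated in the caption of Fig. 4.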

The first $c$ cycles of our process flip the target qubit, but also return all but $c$ of our qubits to their initial state. This set of cycles therefore requires a total of $2n - 2 - c$ Toffoli gates. The average number of Toffoli gates per cycle is therefore

$$N_c(n,c) = \frac{2n - 2 - c}{c}. \tag{1}$$



Given that we require $2c - 1$ cycles, the total number of Toffoli gates for creating an $n$-qubit controlled unitary using $2c - 1$ cycles, $N_t(n,c)$, is

$$N_t(n,c) = (2c-1)\left\lfloor \frac{2n - 2 - c}{c} \right\rfloor \approx \frac{2n(2c-1) - c(3+2c) + 2}{c}. \tag{2}$$
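As a quick arithmetic check, Eqs. (1) and (2) can be evaluated directly. The sketch below assumes Eq. (2) is read with the floor applied to the per-cycle count and with the closed form on the right as its unrounded value; the function names are illustrative.

```python
from fractions import Fraction


def toffolis_per_cycle(n, c):
    """Average Toffoli gates per cycle, Eq. (1), kept as an exact fraction."""
    return Fraction(2 * n - 2 - c, c)


def total_toffolis(n, c):
    """Total Toffoli gates over 2c - 1 cycles, Eq. (2), with the floor applied."""
    return (2 * c - 1) * ((2 * n - 2 - c) // c)


def total_toffolis_smooth(n, c):
    """Closed form on the right of Eq. (2), without rounding."""
    return Fraction(2 * n * (2 * c - 1) - c * (3 + 2 * c) + 2, c)
```

Two limiting cases are reassuring: `total_toffolis(n, 1)` reduces to $2n - 3$, the single-ladder count, and `total_toffolis(n, 2)` reduces to $3(n - 2)$.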



We take the floor function in Eq. (2) because we can always choose to have the shortest cycles as the ones we repeat twice. We use an ancilla qubit as the target of all but one of our multiple-qubit controlled gates, therefore we need $c - 1$ ancilla qubits to act as 'cycle' qubits, which are not reused between cycles. Process ancilla qubits will be used to create the multiple-control gates; these will be reused in each cycle. We need $n - 1$ Toffoli gates to flip our target qubit; $c$ of these will have a cycle ancilla as a target, therefore $n - 1 - c$ will need a process ancilla as a target. The Toffoli gates are equally divided between the cycles, therefore the total number of ancilla qubits required is given by

$$N_a(n,c) = \left\lceil \frac{n - 1 - c}{c} \right\rceil + 1 + c - 1. \tag{3}$$
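The ancilla count can be tabulated and scanned over $c$ by brute force. This sketch assumes Eq. (3) reads $\lceil (n-1-c)/c \rceil + 1 + c - 1$, which is equivalent to $\lceil (n-1)/c \rceil + c - 1$; the function names are illustrative.

```python
import math


def ceil_div(a, b):
    """Ceiling of a / b for integers with b > 0."""
    return -(-a // b)


def ancilla_count(n, c):
    """Ancilla qubits for n controls and c cycles, under the assumed form of Eq. (3)."""
    return ceil_div(n - 1 - c, c) + 1 + c - 1


def floor_sqrt_choice(n):
    """The cycle number c = floor(sqrt(n - 1))."""
    return math.isqrt(n - 1)


def best_cycle_counts(n):
    """Smallest ancilla count over 1 <= c <= n - 1, with every c that achieves it."""
    values = {c: ancilla_count(n, c) for c in range(1, n)}
    best = min(values.values())
    return best, [c for c, v in values.items() if v == best]
```

A scan of this kind shows that several values of $c$ can tie for the smallest ancilla count, and that `floor_sqrt_choice(n)` is always among them.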



We take the ceiling function in Eq. (3) because we need enough ancilla qubits for the longest cycle. To find the minimum number of ancilla qubits we differentiate Eq. (3) without the ceiling function, giving $c = \sqrt{n-1}$. We take $c = \lfloor\sqrt{n-1}\rfloor$ to keep the number of operations as small as possible. Therefore we require

$$N_a\left(n, \lfloor\sqrt{n-1}\rfloor\right) = \left\lceil \frac{n-1}{\lfloor\sqrt{n-1}\rfloor} \right\rceil + \lfloor\sqrt{n-1}\rfloor - 1 \approx 2\sqrt{n-1} - 1 \tag{4}$$



ancilla qubits. This gives us a quadratic reduction in the number of ancilla qubits required. However, as $n$ becomes large, the number of operations required to achieve this minimum tends to

$$N_t\left(n, \sqrt{n-1}\right) = 4(n - \sqrt{n}), \tag{5}$$

meaning that almost double the number of operations are required to get this reduction in the consumption of ancilla qubits.

TABLE I: A comparison between the number of gates we require to make a gate of the form $C^nX$ and the number required by Miller et al. [12]. Both schemes use the Peres gate to break up the Toffoli operations.

For general $n$ and $c = \lfloor\sqrt{n-1}\rfloor$, the total number of operations, assuming the ability to perform our controlled gate using CV, is given by



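$$N_g\left(n, \lfloor\sqrt{n-1}\rfloor\right) = 4\left[4n - \frac{2(n-1)}{\lfloor\sqrt{n-1}\rfloor} - 2\lfloor\sqrt{n-1}\rfloor - 3\right] + 2\lfloor\sqrt{n-1}\rfloor - 1. \tag{6}$$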



This equation takes into account that we need four operations to perform the majority of our Toffoli gates, while $2c - 1$ of our Toffoli gates require five operations. When $n - N_a > 5$, Miller et al. [12] require a total number of operations given by

$$N_{gM} = 24n - 64 - 12\left\lceil \frac{n-1}{\lfloor\sqrt{n-1}\rfloor} \right\rceil - 12\lfloor\sqrt{n-1}\rfloor \tag{7}$$

for the same number of ancilla qubits as we need [17]. We therefore expect to use fewer operations than Miller et al. [12] provided $n > 10$. However, since the formula we derived from the work of Miller et al. is only accurate if $n - N_a > 5$, our comparison is only accurate if $n > 10$, so we could also see an improvement for lower values of $n$.
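The comparison for $n > 10$ can be sketched numerically. The code below is illustrative and rests on assumptions: our count is taken as four operations per Toffoli gate from Eq. (2) plus one extra operation for each of the $2c - 1$ five-operation gates, and the count for Miller et al. follows Eq. (7) as read here. It does not reproduce Table I, and the function names are hypothetical.

```python
import math


def ceil_div(a, b):
    """Ceiling of a / b for integers with b > 0."""
    return -(-a // b)


def our_operations(n):
    """Operations for our scheme with c = floor(sqrt(n - 1)) and native CV gates."""
    c = math.isqrt(n - 1)
    toffolis = (2 * c - 1) * ((2 * n - 2 - c) // c)   # Eq. (2) with the floor
    return 4 * toffolis + (2 * c - 1)                 # 2c - 1 gates cost one extra


def miller_operations(n):
    """Operations attributed to Miller et al., Eq. (7) as read here (n - N_a > 5)."""
    c = math.isqrt(n - 1)
    return 24 * n - 64 - 12 * ceil_div(n - 1, c) - 12 * c
```

Under these assumptions, `our_operations(n)` stays below `miller_operations(n)` across the range $11 \leq n \leq 200$, in line with the expectation stated above.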

In Table I we show that we require fewer operations than Miller et al. [12] for the same number of ancilla qubits for all $n$ except $n = 5$ and $n = 6$. In general, as $n$ becomes larger the improvement we show over Miller et al. also becomes larger. We can also compare our results with those obtained by Barenco et al., who use a system of only two cycles [5]. In this case we need $3(n - 2)$ Toffoli gates compared to the $8(n - 5)$ required by Barenco et al. [5]. Both schemes use the same number of ancilla qubits. This shows the significant advantage of initialising the ancilla qubits.

In this paper we showed that the qubit equivalent of the qudit schemes proposed by Ralph et al. [13] and Lanyon et al. [14] is directly equivalent to the scheme given in Nielsen and Chuang [6]. Simple counting arguments show that this is currently the most efficient way to generate large controlled gates, although it is possible that optimization techniques used in other work [7-12] could also be used in this scenario to get further reductions. We can reduce the number of ancilla qubits required by our system by creating several large Toffoli gates and then combining them to form one even larger Toffoli gate. This adaptation can double the number of operations but can give us a quadratic reduction in the number of ancilla qubits required. The minimum number of ancilla qubits required by our scheme to produce a $C^nX$ gate is given by

$$N_a^{(\min)} = \left\lceil \frac{n-1}{\lfloor\sqrt{n-1}\rfloor} \right\rceil + \lfloor\sqrt{n-1}\rfloor - 1. \tag{8}$$



For large $n$ this would require $4(n - \sqrt{n})$ Toffoli operations, roughly double the number needed if we use $n - 2$ ancilla qubits. We therefore see a trade-off between the number of ancilla qubits and the total number of operations required. When we reduce the number of ancilla qubits, we still require fewer total operations than needed by Miller et al. [12], except when $n = 5$ or $n = 6$; this comparison is shown in Table I. It is worth noting that it might be possible to obtain further improvements using the automated searching techniques provided in these papers.

We show that using initialized ancilla it is possible to get a saving in operations over alternative techniques for both high and low numbers of ancilla. This work clearly shows the advantages of using initialized ancilla for creating large controlled unitaries. The initialization of ancilla qubits is essential for quantum error correction and is generally considered a relatively trivial procedure. The one disadvantage of this scheme is that there is a limit to how far we can reduce the number of ancilla, and we show that the minimum number of ancilla required is roughly $2\sqrt{n-1} - 1$. We hope that it will be possible to obtain further improvements in the number of operations required using the optimization techniques discussed in previous work [7-12].